Valid asymptotic inference with heterogenous clusters

Date: 10.09.2019
Address

Piazza Martiri della Libertà, 33, 56127 Italy

The Institute of Economics will hold the next meeting of its Seminar Series on Tuesday, September 10, 2019: Giuseppe Ragusa, from the University of Pisa, will present the paper Valid asymptotic inference with heterogenous clusters.

 

Abstract

Empirical analysis often focuses on the parameters of linear models that are estimated on data whose sample or experimental design leads to intra-cluster correlation in the regressors and the error. Inference methods that account for the clustering of observations have been available since Moulton (1990) showed the pitfalls in failing to do so. Using cluster-robust standard errors in empirical studies is nowadays routine practice in economics, finance, sociology, political science, and other fields. An analysis of publications in top economic journals focusing on microeconomic empirical research reveals that in 2000 only 14 papers dealt with clusters using robust standard errors. That number had risen to 71 in 2006 and 168 in 2012.

The standard way to account for clusters when carrying out inference about parameters of linear models is to use a cluster-robust variance estimator. This paper shows that the OLS estimator may fail to converge at the usual square-root rate if clusters are too unequal in size. Whether the standard asymptotics hold hinges on the limiting behavior of a quantity that depends only on cluster sizes. We introduce a simple weighted OLS estimator whose distribution converges at the usual rate independently of cluster heterogeneity. The estimator weights the observations by the inverse of the sample size of the cluster, and it is thus straightforward to implement. We also show that the weighted estimator is more efficient than OLS even when cluster sizes differ moderately, with a relative efficiency that is proportional to the measure of heterogeneity. Monte Carlo simulations confirm that the new estimator performs well in samples and show that for typical empirical problems, the efficiency gain can be significant. These results also hold in a big data setting where the number of variables of the model is allowed to grow to infinity. Finally, the findings extend to instrumental variables estimators.
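As a rough illustration of the weighting scheme described in the abstract, the sketch below fits a linear model by weighted least squares, giving every observation in a cluster of size n_g the weight 1/n_g. It is not the author's code: the function name `inverse_cluster_size_wls` and the simulated cluster sizes are assumptions made for this example, and the cluster-robust variance estimator needed for inference is not included.

```python
import numpy as np


def inverse_cluster_size_wls(X, y, cluster_ids):
    """Weighted least squares with weight 1/n_g for each observation in cluster g.

    Hypothetical helper written for illustration; it is not taken from the paper.
    Returns the coefficient vector and the per-observation weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids).ravel()

    # Map each observation to its cluster index and count the cluster sizes.
    _, inverse, counts = np.unique(
        cluster_ids, return_inverse=True, return_counts=True
    )
    weights = 1.0 / counts[inverse]

    # Solve the weighted normal equations (X' W X) b = X' W y.
    Xw = X * weights[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return beta, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [2, 5, 50]  # deliberately unequal cluster sizes (assumed values)
    ids = np.repeat(np.arange(len(sizes)), sizes)
    n = ids.size
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    beta_hat, w = inverse_cluster_size_wls(X, y, ids)
    print("coefficients:", beta_hat)
    print("total weight:", w.sum())
```

Because the weights within each cluster sum to one, every cluster contributes the same total weight to the fit regardless of its size, which is the sense in which the weighting offsets unequal cluster sizes.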